[SPARK-23549][SQL] Cast to timestamp when comparing timestamp with date #20774
If this approach is fine, I will create tests for
Test build #88101 has finished for PR 20774 at commit
cc: @gatorsmile
Although this PR makes sense, we still need a SQLConf. Also, add this behavior change to the migration guide in the SQL doc.
I see. I will add a SQLConf. Do you have a suggestion for the name of this entry?
So it means TimeStamp('2017-03-01 00:00:01') eq Date('2017-03-01') = false? Looks a bit weird.
At least this comparison does not return true in Hive:
hive> select cast('2017-03-01 00:00:00' as timestamp) = cast('2017-03-01' as date);
OK
false
Thank you for trying it.
The current implementation follows this answer.
Which version of Hive did you use?
Hive 2.1.
Thanks.
@dongjoon-hyun, are you referring to this Hive JIRA entry in this comment, or to other Hive JIRA entries?
Seems it's fixed in Hive 2.2.
I think so, but @dongjoon-hyun's comment said it was fixed in 2.0.
Test build #88346 has finished for PR 20774 at commit
retest this please
Test build #88348 has finished for PR 20774 at commit
Test build #88357 has finished for PR 20774 at commit
cc @gatorsmile
gatorsmile left a comment
I verified it in DB2. It behaves as this PR proposes.
private def findCommonTypeForBinaryComparison(
    dt1: DataType, dt2: DataType, conf: SQLConf): Option[DataType] = (dt1, dt2) match {
two more spaces before if
test("binary comparison with string promotion") {
  val rule = TypeCoercion.PromoteStrings(conf)
  ruleTest(rule,
    GreaterThan(Literal("123"), Literal(1)),
    GreaterThan(Cast(Literal("123"), IntegerType), Literal(1)))
  ruleTest(rule,
    LessThan(Literal(true), Literal("123")),
    LessThan(Literal(true), Cast(Literal("123"), BooleanType)))
  ruleTest(rule,
    EqualTo(Literal(Array(1, 2)), Literal("123")),
    EqualTo(Literal(Array(1, 2)), Literal("123")))
  ruleTest(rule,
    GreaterThan(Literal("1.5"), Literal(BigDecimal("0.5"))),
    GreaterThan(Cast(Literal("1.5"), DoubleType), Cast(Literal(BigDecimal("0.5")),
      DoubleType)))
  Seq(true, false).foreach { convertToTS =>
    withSQLConf("spark.sql.hive.compareDateTimestampInTimestamp" -> convertToTS.toString) {
      val date0301 = Literal(java.sql.Date.valueOf("2017-03-01"))
      val timestamp0301000000 = Literal(Timestamp.valueOf("2017-03-01 00:00:00"))
      val timestamp0301000001 = Literal(Timestamp.valueOf("2017-03-01 00:00:01"))
      if (convertToTS) {
        // `Date` should be treated as timestamp at 00:00:00 See SPARK-23549
        ruleTest(rule, EqualTo(date0301, timestamp0301000000),
          EqualTo(Cast(date0301, TimestampType), timestamp0301000000))
        ruleTest(rule, LessThan(date0301, timestamp0301000001),
          LessThan(Cast(date0301, TimestampType), timestamp0301000001))
      } else {
        ruleTest(rule, LessThan(date0301, timestamp0301000000),
          LessThan(Cast(date0301, StringType), Cast(timestamp0301000000, StringType)))
        ruleTest(rule, LessThan(date0301, timestamp0301000001),
          LessThan(Cast(date0301, StringType), Cast(timestamp0301000001, StringType)))
      }
    }
  }
}
A little hard to track the exact changes. Let us move all the new tests to the end.
+1 it is really confusing to look at the diff
docs/sql-programming-guide.md
Outdated
We also need to add it to migration guide.
It should not be under hive, because it is not only for hive
spark.sql.typeCoercion.compareDateTimestampInTimestamp
internal()?
perhaps mention this config will be removed in spark 3.0.
(on a related note we should look at those configs for backward compatibility and consider removing them in 3.0)
Test build #88403 has finished for PR 20774 at commit
Test build #88412 has finished for PR 20774 at commit
Retest this please
Test build #88425 has finished for PR 20774 at commit
Retest this please
Retest this please
Test build #88433 has finished for PR 20774 at commit
Test build #88434 has finished for PR 20774 at commit
Test build #88574 has finished for PR 20774 at commit
LGTM Thanks! Merged to master
What changes were proposed in this pull request?
This PR fixes an incorrect comparison in SQL between timestamp and date. The comparison is incorrect because both values are cast to string and then compared lexicographically. The current implementation returns false for the query spark.sql("select cast('2017-03-01 00:00:00' as timestamp) between cast('2017-02-28' as date) and cast('2017-03-01' as date)").show.
This PR returns true for this query by casting date("2017-03-01") to timestamp("2017-03-01 00:00:00").
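The difference between the two coercions can be illustrated outside Spark. The following is a minimal sketch, not the PR's implementation: the object and method names are hypothetical, plain strings stand in for values cast to string, and java.time values stand in for Spark's timestamp and date types.

```scala
import java.time.{LocalDate, LocalDateTime}

// Hypothetical helper object, for illustration only; not part of Spark.
object DateTimestampComparisonSketch {

  // Models the old behaviour: both sides are rendered as strings and
  // compared lexicographically.
  def betweenAsStrings(ts: String, lo: String, hi: String): Boolean =
    lo.compareTo(ts) <= 0 && ts.compareTo(hi) <= 0

  // Models the proposed behaviour: each date is promoted to a timestamp
  // at 00:00:00 of that day before comparing.
  def betweenAsTimestamps(ts: LocalDateTime, lo: LocalDate, hi: LocalDate): Boolean =
    !ts.isBefore(lo.atStartOfDay) && !ts.isAfter(hi.atStartOfDay)

  def main(args: Array[String]): Unit = {
    // "2017-03-01 00:00:00" has "2017-03-01" as a strict prefix, so it sorts
    // after it as a string and falls outside the upper bound.
    println(betweenAsStrings("2017-03-01 00:00:00", "2017-02-28", "2017-03-01")) // prints false

    // Promoted to a timestamp, the upper bound equals the value under test.
    println(betweenAsTimestamps(
      LocalDateTime.parse("2017-03-01T00:00:00"),
      LocalDate.parse("2017-02-28"),
      LocalDate.parse("2017-03-01"))) // prints true
  }
}
```

The string version fails only because the longer timestamp string sorts after its own date prefix; the instants being compared are identical.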
How was this patch tested?
Added new UTs to TypeCoercionSuite.